AITopics | policy network

Conditioning Matters: Training Diffusion Policies is Faster Than You Think

Neural Information Processing Systems, Jun-21-2026, 18:15:32 GMT

Diffusion policies have emerged as a mainstream paradigm for building vision-language-action (VLA) models. Although they demonstrate strong robot control capabilities, their training efficiency remains suboptimal. In this work, we identify a fundamental challenge in conditional diffusion policy training: when generative conditions are hard to distinguish, the training objective degenerates into modeling the marginal action distribution, a phenomenon we term loss collapse. To overcome this, we propose Cocos, a simple yet general solution that makes the source distribution in conditional flow matching condition-dependent. By anchoring the source distribution around semantics extracted from condition inputs, Cocos encourages stronger condition integration and prevents loss collapse. We provide theoretical justification and extensive empirical results across simulation and real-world benchmarks. Our method achieves faster convergence and higher success rates than existing approaches, matching the performance of large-scale pre-trained VLAs using significantly fewer gradient steps and parameters. Cocos is lightweight, easy to implement, and compatible with diverse policy architectures, offering a general-purpose improvement to diffusion policy training.
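As a rough illustration of the idea in the abstract, the sketch below builds one conditional flow matching training pair whose source sample is anchored on a condition embedding instead of being drawn from a fixed standard Gaussian. The Gaussian form of the source (mean `alpha * cond_embedding`, standard deviation `beta`), the linear interpolation path, and all function and parameter names are assumptions made for this example; the abstract does not specify them.

```python
import random


def sample_source(cond_embedding, alpha, beta, rng):
    """Draw x0 from a Gaussian centred on the (scaled) condition embedding.

    Assumed form: x0 = alpha * embedding + beta * standard normal noise.
    """
    return [alpha * e + beta * rng.gauss(0.0, 1.0) for e in cond_embedding]


def flow_matching_pair(action, cond_embedding, t, alpha=1.0, beta=0.2, rng=None):
    """Return (x_t, target_velocity) for one flow matching training example.

    x_t lies on the straight line from the source sample x0 to the action,
    and the regression target is the constant velocity along that line.
    """
    rng = rng or random.Random(0)
    x0 = sample_source(cond_embedding, alpha, beta, rng)
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, action)]
    target_velocity = [b - a for a, b in zip(x0, action)]
    return x_t, target_velocity


# With beta = 0 the source collapses onto the anchor, which makes the
# dependence on the condition easy to inspect by hand.
x_start, velocity = flow_matching_pair([1.0, 2.0], [0.5, -0.5], t=0.0,
                                       alpha=2.0, beta=0.0)
x_end, _ = flow_matching_pair([1.0, 2.0], [0.5, -0.5], t=1.0,
                              alpha=2.0, beta=0.0)
print(x_start, velocity, x_end)
```

Because the source sample already differs between conditions, two conditions that are hard to tell apart still produce different starting points and therefore different regression targets, which is the intuition the abstract gives for avoiding loss collapse.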

artificial intelligence, machine learning, natural language, (20 more...)

Country: Asia > China (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.95)
(2 more...)

fb7f55f36c53247a704792a721272706-Paper-Conference.pdf

Neural Information Processing Systems, Apr-30-2026, 09:28:10 GMT

artificial intelligence, machine learning, trajectory, (18 more...)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

A Simple Decentralized Cross-Entropy Method

Neural Information Processing Systems, Apr-28-2026, 07:27:09 GMT

The Cross-Entropy Method (CEM) is commonly used for planning in model-based reinforcement learning (MBRL), where a centralized approach is typically used to update the sampling distribution based only on the results of the top-k operation on samples. In this paper, we show that such a centralized approach makes CEM vulnerable to local optima, thus impairing its sample efficiency. To tackle this issue, we propose Decentralized CEM (DecentCEM), a simple but effective improvement over classical CEM that uses an ensemble of CEM instances running independently of one another, each performing a local improvement of its own sampling distribution. We provide both theoretical and empirical analysis to demonstrate the effectiveness of this simple decentralized approach. We empirically show that, compared to the classical centralized approach using either a single Gaussian or even a mixture of Gaussian distributions, DecentCEM finds the global optimum much more consistently and thus improves sample efficiency. Furthermore, we plug DecentCEM into the planning problem of MBRL and evaluate our approach in several continuous control environments, comparing against state-of-the-art CEM-based MBRL approaches (PETS and POPLIN). Results show improved sample efficiency from simply replacing the classical CEM module with our DecentCEM module, at a reasonable additional computational cost. Lastly, we conduct ablation studies for more in-depth analysis.
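The contrast between one centralized sampling distribution and an ensemble of independent ones can be sketched on a toy one-dimensional problem. The code below is a minimal illustration under stated assumptions, not the paper's implementation: the objective `bimodal`, the hyperparameters, the standard-deviation floor, and the rule of returning the instance mean with the highest objective value are all choices made for this example.

```python
import math
import random
import statistics


def cem_step(objective, mean, std, n_samples, top_k, rng):
    """One classical CEM update of a 1-D Gaussian sampling distribution."""
    samples = [rng.gauss(mean, std) for _ in range(n_samples)]
    # Refit the Gaussian to the top-k samples only (the "elites").
    elites = sorted(samples, key=objective, reverse=True)[:top_k]
    new_std = max(statistics.pstdev(elites), 1e-3)  # floor keeps std positive
    return statistics.fmean(elites), new_std


def decentralized_cem(objective, init_means, init_std=1.0,
                      n_samples=20, top_k=5, n_iters=30, seed=0):
    """Run independent CEM instances; return the best mean and all instances."""
    rng = random.Random(seed)
    instances = [(m, init_std) for m in init_means]
    for _ in range(n_iters):
        # Each instance refines its own distribution; no elites are shared.
        instances = [cem_step(objective, m, s, n_samples, top_k, rng)
                     for m, s in instances]
    best_mean = max((m for m, _ in instances), key=objective)
    return best_mean, instances


def bimodal(x):
    """Toy objective: global optimum near x = 3, local optimum near x = -3."""
    return math.exp(-(x - 3.0) ** 2) + 0.5 * math.exp(-(x + 3.0) ** 2)


best, instances = decentralized_cem(bimodal, init_means=[-4.0, 0.0, 4.0])
print(round(best, 2), [round(m, 2) for m, _ in instances])
```

In this toy setting an instance started near the local optimum tends to stay there, while an instance started elsewhere can reach the global one; selecting among the instances at the end is what lets the ensemble recover it.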

artificial intelligence, machine learning, policy network, (18 more...)

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Supplementary Material

Neural Information Processing Systems, Apr-25-2026, 23:03:53 GMT

Then each deterministic NN in $\{\pi_{w,b} \mid (w,b) \in W_\pi\}$ is safe if and only if the system of constraints $\Phi(\pi, X_0, X_u, )$ is not satisfiable. We prove the equivalent claim that there exists a weight vector $(w,b) \in W_\pi$ for which $\pi_{w,b}$ is unsafe if and only if $\Phi(\pi, X_0, X_u, )$ is satisfiable. First, suppose that there exists a weight vector $(w,b) \in W_\pi$ for which $\pi_{w,b}$ is unsafe; we want to show that $\Phi(\pi, X_0, X_u, )$ is satisfiable. This direction of the proof is straightforward, since the values of the network's neurons on the unsafe input give rise to a solution of $\Phi(\pi, X_0, X_u, )$. Indeed, by assumption there exists a vector of input neuron values $x_0 \in X_0$ for which the corresponding vector of output neuron values $x_l = \pi_{w,b}(x_0)$ is unsafe, i.e. $x_l \in X_u$.

artificial intelligence, machine learning, vector, (16 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

525b8410cc8612283c9ecaf9a319f8ed-Supplemental.pdf

Neural Information Processing Systems, Apr-25-2026, 22:18:28 GMT

The gray agent consistently chooses the cyan object over the yellow object (a). The same gray agent moves to the preferred cyan object (b).

artificial intelligence, machine learning, modeling & simulation, (20 more...)

Technology:

Information Technology > Modeling & Simulation (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

41da609c519d77b29be442f8c1105647-Paper.pdf

Neural Information Processing Systems, Apr-25-2026, 15:12:29 GMT

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Genre: Research Report (0.46)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Example where multi-step outperforms one-step

Neural Information Processing Systems, Apr-25-2026, 04:38:49 GMT

As explained in the main text, this section presents an example that is only a slight modification of the one in Figure 4, but where a multi-step approach is clearly preferred over just one step. The data-generating and learning processes are exactly the same (100 trajectories of length 100, discount 0.9, α = 0.1 for reverse KL regularization). The only difference is that rather than using a behavior that is a mixture of optimal and uniform, we use a behavior that is a mixture of maximally suboptimal and uniform. If we call the suboptimal policy π (which always goes down and left in our gridworld), then the behavior for the modified example is β = 0.2 π + 0.8 u, where u is uniform. Results are shown in Figure 7. Figure 7: A gridworld example with modified behavior where multi-step is much better than one-step.
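To make the mixture behavior concrete, the short sketch below computes β = 0.2 π + 0.8 u as a per-state action distribution. It assumes four gridworld actions and reads "always goes down and left" as π splitting its probability evenly between `down` and `left`; both the action set and that reading are assumptions for illustration, since the excerpt does not define them.

```python
ACTIONS = ("up", "down", "left", "right")  # assumed action set

# Assumed form of the suboptimal policy: half "down", half "left".
suboptimal = {"up": 0.0, "down": 0.5, "left": 0.5, "right": 0.0}
uniform = {a: 1.0 / len(ACTIONS) for a in ACTIONS}


def mix(policy_a, policy_b, weight_a):
    """Convex combination weight_a * policy_a + (1 - weight_a) * policy_b."""
    return {a: weight_a * policy_a[a] + (1.0 - weight_a) * policy_b[a]
            for a in ACTIONS}


behavior = mix(suboptimal, uniform, weight_a=0.2)
print({a: round(p, 3) for a, p in behavior.items()})
```

Under these assumptions the behavior still places probability on every action, so the data cover the whole action set, but it is biased toward the two suboptimal moves.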

algorithm, artificial intelligence, machine learning, (18 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

204904e461002b28511d5880e1c36a0f-Supplemental.pdf

Neural Information Processing Systems, Apr-25-2026, 01:33:23 GMT

Similarly to [6], we consider that all environments have the same underlying Structural Causal Model (SCM) and that the different environments correspond to different interventions on the SCM. We provide here the formal definition for SCMs and interventions. We say that $X_i$ causes $X_j$ if $X_i \in \mathrm{Pa}(X_j)$. Definition A.2. (Intervention) [6]: Consider an SCM $C = (S, N)$. An intervention $e$ on $C$ consists of replacing one or several of its structural equations to obtain an intervened SCM $C^e = (S^e, N^e)$ with structural equations: $S_j^e : X_j^e \leftarrow f_j(\mathrm{Pa}(X_j^e), N_j^e)$, for $j = 1, \dots, m$ (11). The variable $X^e$ is intervened on if $S_i \neq S_i^e$ or $N_i \neq N_i^e$.
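The definition can be illustrated with a tiny two-variable SCM in which an intervention replaces one structural equation and leaves the other untouched. The representation below (a dict mapping each variable to its parents and its function, listed in causal order) and the specific linear equations are hypothetical choices for this sketch, not taken from the source.

```python
def evaluate_scm(equations, noise):
    """Evaluate an SCM given one noise value per variable.

    `equations` maps each variable name to (parents, f) and must be listed
    in causal order, so parents are computed before their children.
    """
    values = {}
    for name, (parents, f) in equations.items():
        values[name] = f([values[p] for p in parents], noise[name])
    return values


def intervene(equations, replacements):
    """Return a new SCM with some structural equations replaced."""
    intervened = dict(equations)
    intervened.update(replacements)
    return intervened


# Hypothetical SCM: X1 := N1, X2 := 2 * X1 + N2.
base = {
    "X1": ((), lambda pa, n: n),
    "X2": (("X1",), lambda pa, n: 2.0 * pa[0] + n),
}
# Intervention e: replace the equation of X2 by the constant 5.
scm_e = intervene(base, {"X2": ((), lambda pa, n: 5.0)})

noise = {"X1": 1.0, "X2": 0.5}
observed = evaluate_scm(base, noise)
after = evaluate_scm(scm_e, noise)
intervened_on = [k for k in base if base[k] is not scm_e[k]]
print(observed, after, intervened_on)
```

Only the variable whose structural equation changed counts as intervened on; `X1` keeps the same equation and the same value in both environments.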

artificial intelligence, different environment, machine learning, (17 more...)

Industry: Health & Medicine (1.00)

Technology: